Intelligibility enhancement of synthetic speech in noise
نویسنده
چکیده
Speech technology can facilitate human-machine interaction and create new communication interfaces. Text-To-Speech (TTS) systems provide speech output for dialogue, notification and reading applications as well as personalized voices for people that have lost the use of their own. TTS systems are built to produce synthetic voices that should sound as natural, expressive and intelligible as possible and if necessary be similar to a particular speaker. Although naturalness is an important requirement, providing the correct information in adverse conditions can be crucial to certain applications. Speech that adapts or reacts to different listening conditions can in turn be more expressive and natural. In this work we focus on enhancing the intelligibility of TTS voices in additive noise. For that we adopt the statistical parametric paradigm for TTS in the shape of a hidden Markov model (HMM-) based speech synthesis system that allows for flexible enhancement strategies. Little is known about which human speech production mechanisms actually increase intelligibility in noise and how the choice of mechanism relates to noise type, so we approached the problem from another perspective: using mathematical models for hearing speech in noise. To find which models are better at predicting intelligibility of TTS in noise we performed listening evaluations to collect subjective intelligibility scores which we then compared to the models’ predictions. In these evaluations we observed that modifications performed on the spectral envelope of speech can increase intelligibility significantly, particularly if the strength of the modification depends on the noise and its level. We used these findings to inform the decision of which of the models to use when automatically modifying the spectral envelope of the speech according to the noise. We devised two methods, both involving cepstral coefficient modifications. The first was applied during extraction while training the acoustic models and the other when generating a voice using pre-trained TTS models. The latter has the advantage of being able to address fluctuating noise. To increase intelligibility of synthetic speech at generation time we proposed a method for Mel cepstral coefficient modification based on the glimpse proportion measure, the most promising of the models of speech intelligibility that we evaluated. An extensive series of listening experiments demonstrated that this method brings significant intelligibility gains to TTS voices while not requiring additional recordings of clear or Lombard speech. To further improve intelligibility we combined our method with noise-independent enhancement approaches based on the acoustics of highly intelligible speech. This combined solution was as effective for stationary noise as for the challenging com-
منابع مشابه
Speech Enhancement Through an Optimized Subspace Division Technique
The speech enhancement techniques are often employed to improve the quality and intelligibility of the noisy speech signals. This paper discusses a novel technique for speech enhancement which is based on Singular Value Decomposition. This implementation utilizes a Genetic Algorithm based optimization method for reducing the effects of environmental noises from the singular vectors as well as t...
متن کاملSpeech Enhancement Through an Optimized Subspace Division Technique
The speech enhancement techniques are often employed to improve the quality and intelligibility of the noisy speech signals. This paper discusses a novel technique for speech enhancement which is based on Singular Value Decomposition. This implementation utilizes a Genetic Algorithm based optimization method for reducing the effects of environmental noises from the singular vectors as well as t...
متن کاملEvaluating speech intelligibility enhancement for HMM-based synthetic speech in noise
It is possible to increase the intelligibility of speech in noise by enhancing the clean speech signal. In this paper we demonstrate the effects of modifying the spectral envelope of synthetic speech according to the environmental noise. To achieve this, we modify Mel cepstral coefficients according to an intelligibility measure that accounts for glimpses of speech in noise: the Glimpse Proport...
متن کاملCombining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise
This paper presents our entry to a speech-in-noise intelligibility enhancement evaluation: the Hurricane Challenge. The system consists of a Text-To-Speech voice manipulated through a combination of enhancement strategies, each of which is known to be individually successful: a perceptually-motivated spectral shaper based on the Glimpse Proportion measure, dynamic range compression, and adaptat...
متن کاملEnergy reallocation strategies for speech enhancement in known noise conditions
Speech output, whether live, recorded or synthetic, is often employed in difficult listening conditions. Context-sensitive speech modifications aim to promote intelligibility while maintaining quality and listener comfort. The current study used objective measures of intelligibility and quality to compare five energy reallocation strategies operating under equal energy and preserved duration co...
متن کاملIntelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion
This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM) generated synthetic speech in noise. e present a method for modifying the Mel cepstral coefficients generated by statistical parametric models that have been trained n plain speech. We update these coefficients such that the glimpse proportion – an objective measure of the intelligibility of speech n noise – i...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013